20099: Add Decimal support for floor preimage by martin-augment · Pull Request #224 · martin-augment/datafusion

martin-augment · 2026-02-02T08:18:50Z

20099: To review by AI

gemini-code-assist · 2026-02-02T08:19:10Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the floor function's optimization capabilities by introducing support for Decimal data types in its preimage calculation. This change allows the query optimizer to convert floor(decimal_column) = N conditions into more performant range filters, such as decimal_column >= N AND decimal_column < N + 1. The implementation includes a new utility function for decimal bound computation and is thoroughly validated with both unit tests and SQL logic tests to ensure correctness and proper query plan generation.

Highlights

Decimal Support for Floor Preimage: Extended the floor function's preimage optimization to support all Decimal data types (Decimal32, Decimal64, Decimal128, Decimal256). This allows predicates like floor(decimal_col) = N to be transformed into efficient range scans (decimal_col >= N AND decimal_col < N + 1).
New decimal_preimage_bounds Function: Introduced a new helper function decimal_preimage_bounds to calculate the lower and upper bounds for the floor preimage of decimal values, handling fractional parts (returning None) and potential overflow during bound calculation.
Comprehensive Unit Tests: Added extensive unit tests for the new decimal preimage logic, covering valid integer-like decimal inputs, non-integer decimals, overflow scenarios, edge cases (min/max values), and null inputs across all Decimal types.
SQL Logic Test Verification: A new SQL logic test file (floor_preimage.slt) was added to verify the optimization in the query planner for Decimal128, alongside existing Float64 and Int32 tests. This includes EXPLAIN plans to confirm the predicate transformation for various comparison operators.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-02T08:19:17Z

Walkthrough

This change adds support for decimal types (Decimal32, Decimal64, Decimal128, Decimal256) to the floor preimage optimization logic in DataFusion. A new helper function decimal_preimage_bounds computes preimage ranges for decimal values with proper handling of precision, scale, and overflow. The core floor matching logic is extended to handle decimal scalar values. Additionally, a new sqllogictest file verifies the floor preimage transformation works correctly across different data types, comparison operators, and edge cases, including NULL handling and non-optimizable predicates.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pr-20099-2026-02-02-08-18-45

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-02-02T08:20:20Z

Code Review: Add Decimal support for floor preimage

Summary

This PR adds decimal type support (Decimal32, Decimal64, Decimal128, Decimal256) to the floor function's preimage optimization. The implementation looks solid overall with comprehensive test coverage. Here are my findings:

✅ Strengths

Comprehensive Test Coverage: Excellent unit tests covering all decimal types with edge cases (overflow, null, non-integer, min/max values)
Consistent Pattern: The implementation follows the existing pattern for float and integer types
Good Documentation: The SQL logic tests are well-documented and cover various comparison operators
Correctness: The decimal preimage logic correctly implements the mathematical constraint that floor always returns integers

🔍 Code Quality & Best Practices

1. Import Organization (datafusion/functions/src/math/floor.rs:22-27)

The imports are clean, but consider alphabetical ordering within groups:

use arrow::compute::{DecimalCast, rescale_decimal};
use arrow::datatypes::{
    ArrowNativeTypeOp, DataType, Decimal32Type, Decimal64Type, Decimal128Type,
    Decimal256Type, DecimalType, Float32Type, Float64Type,
};

2. Potential for Code Deduplication (datafusion/functions/src/math/floor.rs:262-301)

The four decimal type cases are nearly identical. While the current approach is clear and explicit, consider if a macro or generic helper could reduce repetition. However, the current explicit approach is acceptable for maintainability.

🐛 Potential Issues

1. Fractional Part Check Logic (datafusion/functions/src/math/floor.rs:378-379)

if scale > 0 && value % one_scaled != D::Native::ZERO {
    return None;
}

Concern: This check only applies when scale > 0, but what about when scale < 0? According to SQL standards, decimal types can have negative scales (e.g., DECIMAL(10, -2) means rounding to hundreds).

Questions:

Does Arrow support negative scales for decimals?
If so, should the fractional check be skipped or handled differently?
The test cases only use scale >= 0, so this edge case is untested

Recommendation: Either:

Add a comment explaining that negative scales are not supported/expected
Add validation or handling for negative scales
Add test coverage for scale = 0 and scale < 0 cases

2. Error Handling in rescale_decimal (datafusion/functions/src/math/floor.rs:368-373)

let one_scaled: D::Native = rescale_decimal::<D, D>(
    D::Native::ONE,
    1,
    0,
    precision,
    scale,
)?;

Observation: rescale_decimal returns Option, which is propagated with ?. This is correct, but it's worth documenting when this can fail:

Invalid precision/scale combinations
Overflow during scaling

Recommendation: Add a comment explaining failure conditions for future maintainers.

⚡ Performance Considerations

Modulo Operation: The fractional part check uses modulo (value % one_scaled), which is generally efficient for integer types. No concerns here.
rescale_decimal Call: This is called once per preimage computation, which is acceptable since this is an optimization path that runs during query planning, not query execution.

🔒 Security Concerns

No security issues identified. The implementation:

Uses checked arithmetic (add_checked) to prevent overflow
Properly handles null values
Returns None for invalid inputs rather than panicking

🧪 Test Coverage

Excellent coverage with the following test categories:

✅ Valid cases for all decimal types
✅ Non-integer values (should return None)
✅ Overflow cases
✅ Edge cases (min/max values)
✅ Null handling
✅ All comparison operators (=, <>, <, >, <=, >=)
✅ IS DISTINCT FROM / IS NOT DISTINCT FROM

Potential additions:

Test with scale = 0 explicitly (currently in edge cases but worth a dedicated test)
Test with very high scales (e.g., scale = 38 for Decimal128)
Integration test showing the optimization actually improves query performance

📝 Documentation & Comments

Good:

Function documentation is clear
Test comments explain the mathematical constraints

Could improve:

Add inline comment explaining why fractional values return None (datafusion/functions/src/math/floor.rs:378)
Document the rescale_decimal approach in the function comment

🎯 Specific Recommendations

Address the scale < 0 edge case or add documentation

Consider adding a comment in decimal_preimage_bounds explaining:

/// Compute preimage bounds for floor function on decimal types.
/// For floor(x) = n, the preimage is [n, n+1).
/// 
/// Returns None if:
/// - The value has a fractional part (floor always returns integers)
/// - Adding 1 would overflow
/// - rescale_decimal fails (invalid precision/scale)

Minor: Test naming consistency - some tests use decimal prefix, others don't. Consider consistent naming like:
- test_floor_preimage_decimal32_*
- test_floor_preimage_decimal64_*
  etc.

✅ Final Verdict

This is a high-quality PR that correctly implements decimal support for floor preimage optimization. The code is well-tested, follows existing patterns, and handles edge cases properly.

Recommended Action: Approve with minor suggestions for documentation improvements and clarification on negative scale handling.

Nice work! 🎉

Minor Nits

Line 262-301: Consider adding blank lines between different decimal type cases for readability
Test file header (datafusion/sqllogictest/test_files/floor_preimage.slt:18-25): Well-structured and informative

gemini-code-assist

Code Review

This pull request adds support for Decimal types to the floor function's preimage optimization, which is a great enhancement. The implementation looks correct, and the added unit tests and sqllogictest are comprehensive.

I have a few suggestions to improve code clarity and maintainability, mainly around reducing code duplication in the preimage function and improving the clarity of some test cases by using more idiomatic arithmetic operators. Overall, this is a solid contribution.

gemini-code-assist · 2026-02-02T08:21:20Z

datafusion/functions/src/math/floor.rs

+            // Decimal types
+            ScalarValue::Decimal32(Some(n), precision, scale) => {
+                decimal_preimage_bounds::<Decimal32Type>(*n, *precision, *scale).map(
+                    |(lo, hi)| {
+                        (
+                            ScalarValue::Decimal32(Some(lo), *precision, *scale),
+                            ScalarValue::Decimal32(Some(hi), *precision, *scale),
+                        )
+                    },
+                )
+            }
+            ScalarValue::Decimal64(Some(n), precision, scale) => {
+                decimal_preimage_bounds::<Decimal64Type>(*n, *precision, *scale).map(
+                    |(lo, hi)| {
+                        (
+                            ScalarValue::Decimal64(Some(lo), *precision, *scale),
+                            ScalarValue::Decimal64(Some(hi), *precision, *scale),
+                        )
+                    },
+                )
+            }
+            ScalarValue::Decimal128(Some(n), precision, scale) => {
+                decimal_preimage_bounds::<Decimal128Type>(*n, *precision, *scale).map(
+                    |(lo, hi)| {
+                        (
+                            ScalarValue::Decimal128(Some(lo), *precision, *scale),
+                            ScalarValue::Decimal128(Some(hi), *precision, *scale),
+                        )
+                    },
+                )
+            }
+            ScalarValue::Decimal256(Some(n), precision, scale) => {
+                decimal_preimage_bounds::<Decimal256Type>(*n, *precision, *scale).map(
+                    |(lo, hi)| {
+                        (
+                            ScalarValue::Decimal256(Some(lo), *precision, *scale),
+                            ScalarValue::Decimal256(Some(hi), *precision, *scale),
+                        )
+                    },
+                )
+            }


These match arms for Decimal32, Decimal64, Decimal128, and Decimal256 are very repetitive. This code duplication could be reduced to improve maintainability.

Consider defining a local macro within the preimage function to handle the repetitive logic. For example:

macro_rules! handle_decimal { ($T:ty, $n:expr, $p:expr, $s:expr, $scalar_variant:path) => { decimal_preimage_bounds::<$T>(*$n, *$p, *$s).map(|(lo, hi)| { ($scalar_variant(Some(lo), *$p, *$s), $scalar_variant(Some(hi), *$p, *$s)) }) }; }

Then each match arm could be simplified to a single line, e.g.:
ScalarValue::Decimal32(Some(n), precision, scale) => handle_decimal!(Decimal32Type, n, precision, scale, ScalarValue::Decimal32)

Since adding the macro definition is outside the changed lines, I'm not providing a direct code suggestion, but this refactoring would make the code more concise.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The match arms are identical and differ only in the type of the handled Decimal type. Prevents code duplication and double maintenance.

gemini-code-assist · 2026-02-02T08:21:20Z

datafusion/functions/src/math/floor.rs

+
+        // Decimal256: i256::MAX
+        assert_preimage_none(ScalarValue::Decimal256(
+            Some(i256::MAX.wrapping_sub(i256::from(50))),


Using wrapping_sub here can be misleading as the subtraction is not expected to wrap. Using the standard subtraction operator - would be clearer and more accurately reflect the test's intent, as i256 implements Sub.

Suggested change

Some(i256::MAX.wrapping_sub(i256::from(50))),

Some(i256::MAX - i256::from(50)),

value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct! There is no implementation of std::ops::Sub for i256 in Apache Arrow

gemini-code-assist · 2026-02-02T08:21:20Z

datafusion/functions/src/math/floor.rs

+        assert_preimage_range(
+            ScalarValue::Decimal256(Some(large_256), 76, 2),
+            ScalarValue::Decimal256(Some(large_256), 76, 2),
+            ScalarValue::Decimal256(Some(large_256.wrapping_add(i256::from(100))), 76, 2),


For clarity and consistency with decimal_preimage_bounds which uses add_checked, it's better to use the + operator here instead of wrapping_add. Since this test case is for non-overflowing values, + better reflects the intent.

Suggested change

ScalarValue::Decimal256(Some(large_256.wrapping_add(i256::from(100))), 76, 2),

ScalarValue::Decimal256(Some(large_256 + i256::from(100)), 76, 2),

value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct! There is no implementation of std::ops::Add for i256 in Apache Arrow

gemini-code-assist · 2026-02-02T08:21:20Z

datafusion/functions/src/math/floor.rs

+        assert_preimage_range(
+            ScalarValue::Decimal256(Some(neg_256), 76, 2),
+            ScalarValue::Decimal256(Some(neg_256), 76, 2),
+            ScalarValue::Decimal256(Some(neg_256.wrapping_add(i256::from(100))), 76, 2),


Similar to the previous case, using + instead of wrapping_add would be clearer here, as this is a non-overflowing test case.

Suggested change

ScalarValue::Decimal256(Some(neg_256.wrapping_add(i256::from(100))), 76, 2),

ScalarValue::Decimal256(Some(neg_256 + i256::from(100)), 76, 2),

value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct! There is no implementation of std::ops::Add for i256 in Apache Arrow

augmentcode · 2026-02-02T08:25:31Z

🤖 Augment PR Summary

Summary: Extends the floor scalar UDF preimage logic to support Arrow Decimal types so predicate simplification can be applied to decimal comparisons.

Changes:

Adds Decimal32/64/128/256 handling in FloorFunc::preimage to derive [N, N+1) bounds.
Introduces a shared decimal_preimage_bounds helper that computes “+1” at the correct scale via rescale_decimal and uses checked arithmetic for overflow safety.
Adds comprehensive unit tests covering valid decimal ranges, non-integer decimal literals, null literals, and boundary conditions.
Adds a new SQLLogicTest file validating both correctness and EXPLAIN-plan rewrites for Float64/Int32/Decimal predicates.

Technical Notes: Decimal preimage optimization is only applied when the compared literal is an integer value at its declared scale and N + 1 can be represented without overflow.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 1 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-02-02T08:25:32Z

datafusion/functions/src/math/floor.rs

+
+        // Decimal32: i32::MAX
+        // For scale=2, we add 100, so i32::MAX - 50 would overflow
+        assert_preimage_none(ScalarValue::Decimal32(Some(i32::MAX - 50), 9, 2));


These scale=2 “overflow” cases (e.g. i32::MAX - 50) aren’t aligned to the scale factor (100), so preimage returns None due to the “non-integer decimal” check rather than actually exercising the add_checked overflow path.

Other Locations

datafusion/functions/src/math/floor.rs:684

datafusion/functions/src/math/floor.rs:688

datafusion/functions/src/math/floor.rs:692

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

value:useful; category:bug; feedback: The Augment AI reviewer is correct! i32::MAX - 50 gives 2147483597, which has 10 digits and does not fit into precision=9 at all. So, the decimal number is overflown by construction and there is no call to floor() at all

martin-augment · 2026-02-02T13:16:13Z

1. Fractional Part Check Logic (datafusion/functions/src/math/floor.rs:378-379)
if scale > 0 && value % one_scaled != D::Native::ZERO {
    return None;
}
Concern: This check only applies when scale > 0, but what about when scale < 0? According to SQL standards, decimal types can have negative scales (e.g., DECIMAL(10, -2) means rounding to hundreds).

Questions:

Does Arrow support negative scales for decimals?

If so, should the fractional check be skipped or handled differently?

The test cases only use scale >= 0, so this edge case is untested

Recommendation: Either:

Add a comment explaining that negative scales are not supported/expected

Add validation or handling for negative scales

Add test coverage for scale = 0 and scale < 0 cases

value:useful; category:bug; feedback: The Claude AI reviewer is correct! Apache Arrow supports negative scales for decimals, so this check won't detect such decimals with fractional part. Prevents a bug with decimals with negative scales. Also shows that there are no unit/slt tests for such.

Devanshu added 4 commits February 1, 2026 16:21

Add decimal support

5ff1cd2

Update for other decimal types

77e3e4c

SLT tests

d32651d

Add SLT tests for other operator types

1830ace

gemini-code-assist bot reviewed Feb 2, 2026

View reviewed changes

augmentcode bot reviewed Feb 2, 2026

View reviewed changes

	Some(i256::MAX.wrapping_sub(i256::from(50))),
	Some(i256::MAX - i256::from(50)),

	ScalarValue::Decimal256(Some(large_256.wrapping_add(i256::from(100))), 76, 2),
	ScalarValue::Decimal256(Some(large_256 + i256::from(100)), 76, 2),

	ScalarValue::Decimal256(Some(neg_256.wrapping_add(i256::from(100))), 76, 2),
	ScalarValue::Decimal256(Some(neg_256 + i256::from(100)), 76, 2),

Conversation

martin-augment commented Feb 2, 2026

Uh oh!

gemini-code-assist bot commented Feb 2, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

claude bot commented Feb 2, 2026

Code Review: Add Decimal support for floor preimage

Summary

✅ Strengths

🔍 Code Quality & Best Practices

1. Import Organization (datafusion/functions/src/math/floor.rs:22-27)

2. Potential for Code Deduplication (datafusion/functions/src/math/floor.rs:262-301)

🐛 Potential Issues

1. Fractional Part Check Logic (datafusion/functions/src/math/floor.rs:378-379)

2. Error Handling in rescale_decimal (datafusion/functions/src/math/floor.rs:368-373)

⚡ Performance Considerations

🔒 Security Concerns

🧪 Test Coverage

📝 Documentation & Comments

🎯 Specific Recommendations

✅ Final Verdict

Minor Nits

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

augmentcode bot commented Feb 2, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment commented Feb 2, 2026

1. Fractional Part Check Logic (datafusion/functions/src/math/floor.rs:378-379)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Feb 2, 2026 •

edited

Loading